Layered Maps, Choropleth Maps, Map Projections, Animation
This week, we’ll layer points and other geometries on maps, create choropleth maps, learn more about map projections, and create animated plots in ggplot().
(40 points total)
Load the airports and routes datasets from Lecture 23. Use ggmap to create a map centered on the United States. Add points corresponding to the location of each airport, sized by the number of arriving flights at that airport.
library(plyr)
library(dplyr)
library(ggmap)
# Read the OpenFlights airport and route tables (neither file has a header row)
airports <- read.csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat", header = FALSE)
colnames(airports) <- c("ID", "name", "city", "country", "IATA_FAA", "ICAO",
                        "lat", "lon", "altitude", "timezone", "DST")
routes <- read.csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat", header = FALSE)
colnames(routes) <- c("airline", "airlineID", "sourceAirport",
                      "sourceAirportID", "destinationAirport",
                      "destinationAirportID", "codeshare", "stops", "equipment")
# Count the routes departing from and arriving at each airport
departures <- ddply(routes, .(sourceAirportID), "nrow")
names(departures)[2] <- "flights"
arrivals <- ddply(routes, .(destinationAirportID), "nrow")
names(arrivals)[2] <- "flights"
# Attach the counts to the airport coordinates
airportD <- merge(airports, departures, by.x = "ID",
                  by.y = "sourceAirportID")
airportA <- merge(airports, arrivals, by.x = "ID",
                  by.y = "destinationAirportID")
# Download a base map centered on the United States
map <- get_map(location = "United States", zoom = 4)
## Source : https://maps.googleapis.com/maps/api/staticmap?center=United+States&zoom=4&size=640x640&scale=2&maptype=terrain&language=en-EN
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=United%20States
# One point per airport; the square-root transform keeps the largest hubs from dominating
ggmap(map) +
  geom_point(aes(x = lon, y = lat, size = flights), data = airportA,
             alpha = .45) +
  labs(x = "Longitude", y = "Latitude") +
  ggtitle("Flight Arrival Locations") +
  scale_size_continuous(name = "Number of\nArrivals", trans = "sqrt",
                        breaks = c(10, 50, 200, 800))
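The counting-and-merging step above can also be written with base R alone. The following is a minimal sketch on a made-up toy table (the `routes_toy` and `airports_toy` data frames and their values are hypothetical, not taken from the OpenFlights files); it counts arrivals per destination and then joins the counts to the airport table.

```r
# Hypothetical toy data: five routes among airports 1-3; airport 4 has no routes
routes_toy <- data.frame(
  sourceAirportID      = c(1, 1, 2, 3, 3),
  destinationAirportID = c(2, 3, 3, 1, 2)
)
airports_toy <- data.frame(ID = 1:4, name = c("A", "B", "C", "D"))

# Give every route a weight of 1, then sum the weights per destination
arrivals_toy <- aggregate(flights ~ destinationAirportID,
                          data = transform(routes_toy, flights = 1),
                          FUN = sum)

# Inner join: airports with no arriving routes (here, airport 4) are dropped
airportA_toy <- merge(airports_toy, arrivals_toy,
                      by.x = "ID", by.y = "destinationAirportID")
print(airportA_toy)
```

Because `merge()` performs an inner join by default, airports with no arrivals never reach the plot; passing `all.x = TRUE` would keep them with `NA` counts instead.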
Combine the routes and airports datasets so that you can use geom_segment() to draw a line connecting each airport for each flight listed in the routes dataset. That is, draw a line that connects the departing airport and the arrival airport. Do this only for flights that either depart from or arrive at the Atlanta (ATL) airport.
# Keep only routes with ATL at one end
atl_routes <- subset(routes, sourceAirport == "ATL" |
                       destinationAirport == "ATL")
# First merge: attach the coordinates of the departure airport
atl_airports <- merge(atl_routes, airports, by.x = "sourceAirport",
                      by.y = "IATA_FAA")
atl_airports <- atl_airports[, c("destinationAirport", "lat", "lon",
                                 "timezone")]
colnames(atl_airports) <- c("destinationAirport", "source_lat", "source_lon",
                            "source_timezone")
# Second merge: attach the coordinates of the arrival airport
atl_airports <- merge(atl_airports, airports, by.x = "destinationAirport",
                      by.y = "IATA_FAA")
atl_airports <- atl_airports[, c("source_lat", "source_lon", "source_timezone",
                                 "lat", "lon", "timezone")]
colnames(atl_airports) <- c("source_lat", "source_lon", "source_timezone",
                            "dest_lat", "dest_lon", "dest_timezone")
# One straight segment per route
ggmap(map) +
  geom_segment(aes(x = source_lon, y = source_lat, xend = dest_lon,
                   yend = dest_lat), data = atl_airports, alpha = .15) +
  labs(x = "Longitude", y = "Latitude") +
  ggtitle("Flights to and from Atlanta")
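To make the two-step merge easier to follow, here is a small sketch on hypothetical toy data (the `routes_toy` and `airports_toy` tables are invented for illustration, and the coordinates are only approximate). It renames the airport columns before each merge, which is one possible alternative to reshuffling columns afterwards, not the approach used in the solution itself.

```r
# Hypothetical toy tables; coordinates are approximate and for illustration only
airports_toy <- data.frame(
  IATA_FAA = c("ATL", "JFK", "LAX"),
  lat      = c(33.64, 40.64, 33.94),
  lon      = c(-84.43, -73.78, -118.41)
)
routes_toy <- data.frame(
  sourceAirport      = c("ATL", "JFK", "LAX", "JFK"),
  destinationAirport = c("JFK", "ATL", "ATL", "LAX")
)

# Keep only routes with ATL at one end (drops JFK -> LAX)
atl_toy <- subset(routes_toy, sourceAirport == "ATL" | destinationAirport == "ATL")

# Two copies of the airport table, renamed so each merge adds clearly labeled columns
src <- setNames(airports_toy, c("sourceAirport", "source_lat", "source_lon"))
dst <- setNames(airports_toy, c("destinationAirport", "dest_lat", "dest_lon"))

# Join once for the departure coordinates and once for the arrival coordinates
segments_toy <- merge(merge(atl_toy, src, by = "sourceAirport"),
                      dst, by = "destinationAirport")
print(segments_toy)
```

Each row of `segments_toy` now holds both endpoints of one route, which is the shape geom_segment() expects for its x, y, xend, and yend aesthetics.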
See the help documentation for geom_segment() for how to do this. Do this only for flights that either depart from or arrive at the Atlanta (ATL) airport.
# Draw each route as a curved arrow. geom_curve() is not supported by
# ggmap's default map coordinates, so switch to coord_cartesian().
ggmap(map) +
  geom_curve(aes(x = source_lon, y = source_lat, xend = dest_lon,
                 yend = dest_lat), data = atl_airports,
             arrow = arrow(length = unit(0.02, "npc")), alpha = .25,
             curvature = .4) +
  coord_cartesian() +
  labs(x = "Longitude", y = "Latitude") +
  ggtitle("Flights to and from Atlanta")
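Do this only for flights that either depart from or arrive at the Atlanta (ATL) airport.
# Color each route by the time zone offset between origin and destination;
# differences outside the limits [-3, 3] are drawn in the scale's NA color.
ggmap(map) +
  geom_curve(aes(x = source_lon, y = source_lat, xend = dest_lon,
                 yend = dest_lat, color = source_timezone - dest_timezone),
             data = atl_airports, arrow = arrow(length = unit(0.02, "npc")),
             alpha = .25, curvature = .4) +
  coord_cartesian() +
  labs(x = "Longitude", y = "Latitude") +
  ggtitle("Flights to and from Atlanta") +
  scale_colour_gradient(name = "Timezone\nDifference", limits = c(-3, 3),
                        low = "#FFC60D", high = "#810000")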
For more interesting examples and an in-depth description of ggmap, see the short paper by David Kahle and Hadley Wickham, “ggmap: Spatial Visualization with ggplot2” (The R Journal, 2013).
Another good resource is here.
(6 points each; 42 points total)
Choropleth maps are maps in which geographic regions (e.g., countries, states, counties, or tracts) are colored according to a measured or statistical quantity. We’ll create some choropleth maps here.
Code is provided for the following tasks:
Create the state and county columns
# Read the 2009 county unemployment data (the file has no header row)
unemp <- read.csv("http://datasets.flowingdata.com/unemployment09.csv",
                  header = FALSE, stringsAsFactors = FALSE)
names(unemp) <- c("id", "state_fips", "county_fips", "name", "year",
                  "?", "?", "?", "rate")
# "Autauga County, AL" -> county "autauga", state "AL"
unemp$county <- tolower(gsub(" County, [A-Z]{2}", "", unemp$name))
unemp$state <- gsub("^.*([A-Z]{2}).*$", "\\1", unemp$name)
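The two regular expressions above are easier to check on a couple of sample strings. This sketch applies them to two illustrative names (the `sample_names` vector is made up for the example, though it follows the "Name County, ST" pattern the code expects). Note the assumption built into the first pattern: it only strips the literal text " County, ST", so names that use another designation, such as "Parish", pass through unchanged apart from lowercasing.

```r
# Illustrative inputs following the "<Name> County, <ST>" pattern
sample_names <- c("Autauga County, AL", "Acadia Parish, LA")

# Strip " County, ST" and lowercase; the Parish name does not match the pattern
county <- tolower(gsub(" County, [A-Z]{2}", "", sample_names))

# Greedy ".*" backtracks to the last pair of capital letters, i.e. the state code
state <- gsub("^.*([A-Z]{2}).*$", "\\1", sample_names)

print(data.frame(name = sample_names, county = county, state = state))
```

Rows whose county string is not cleaned this way will not match the lowercase county names used by map_data("county"), so they would silently drop out of a later inner merge.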
Code is provided for the following tasks:
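Use map_data() for US counties and for US states
Use the state.abb and state.name objects to add proper abbreviations to a new variable, called state, in the county data.frame
library(ggmap)
# County polygons; rename "region"/"subregion" to match the unemployment data
county_df <- map_data("county")
names(county_df) <- c("long", "lat", "group", "order", "state_name", "county")
# Look up each lowercase state name and store its two-letter abbreviation
county_df$state <- state.abb[match(county_df$state_name, tolower(state.name))]
county_df$state_name <- NULL
# State polygons, used later to draw state borders
state_df <- map_data("state")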
Code is provided for the following tasks:
Merge the county_df and unemp data.frames using the state and county variables.
Create a new variable, rate_discrete, that partitions the existing rate variable into 9 groups.
Sort by the order variable, so that you can plot the map correctly.
choropleth_df <- merge(county_df, unemp, by = c("state", "county"))
# merge() reorders rows, so restore the polygon vertex order before plotting
choropleth_df <- choropleth_df[order(choropleth_df$order), ]
choropleth_df$rate_discrete <- cut_interval(choropleth_df$rate, 9)
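To see what partitioning a rate into 9 equal-width groups looks like, here is a base R sketch on made-up values (the `rate` vector is hypothetical). It builds the breaks by hand and uses cut(); this is an approximation of what I understand cut_interval() to do, stated as an assumption rather than a description of ggplot2's internals.

```r
# Hypothetical unemployment rates, in percent
rate <- c(1, 3.5, 4.8, 7.1, 9.9, 12.4, 15.0, 19)

# Nine equal-width bins need ten equally spaced break points from min to max
n_bins <- 9
breaks <- seq(min(rate), max(rate), length.out = n_bins + 1)

# include.lowest = TRUE keeps the minimum value inside the first bin
rate_discrete <- cut(rate, breaks = breaks, include.lowest = TRUE)
print(table(rate_discrete))
```

Equal-width bins can leave some groups empty when the data are skewed; equal-count bins (quantiles) are the usual alternative when that matters.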
Use ggplot() to create a choropleth map of US counties. The code is started for you below.
ggplot(choropleth_df, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = rate_discrete)) +
  geom_polygon(data = state_df, col = "white", fill = NA) +
  ggtitle("US Unemployment Choropleth Map") + xlab("Longitude") + ylab("Latitude") +
  scale_fill_brewer(palette = "YlGnBu")
Interpret your graph. In what areas of the United States is unemployment highest? In what areas is it lowest? Are there any noticeable geographic trends or patterns? Does anything else stick out to you in the graph?
Unemployment is very low in the center of the United States. Rates tend to increase as we move toward the West, East, and South. Unemployment appears to be highest in California, Oregon, and Michigan, as well as in states north of (and including) Mississippi, Alabama, Georgia, Florida, and South Carolina.
Look at the help file for the coord_map() function. Name at least 5 different map projections that are available to be used, and include the description of each one. (Note: You’ll have to also look at the help documentation for the mapproject() function.)
mercator(): equally spaced straight meridians, conformal, straight compass courses
sinusoidal(): equally spaced parallels, equal-area, same as bonne(0)
cylequalarea(lat0): equally spaced straight meridians, equal-area, true scale on lat0
cylindrical(): central projection on tangent cylinder
rectangular(lat0): equally spaced parallels, equally spaced straight meridians, true scale on lat0
# Mercator projection
ggplot(choropleth_df, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = rate_discrete)) +
  geom_polygon(data = state_df, col = "white", fill = NA) +
  ggtitle("US Unemployment Choropleth Map") + xlab("Longitude") + ylab("Latitude") +
  scale_fill_brewer(palette = "YlGnBu") +
  coord_map("mercator")
# Sinusoidal projection
ggplot(choropleth_df, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = rate_discrete)) +
  geom_polygon(data = state_df, col = "white", fill = NA) +
  ggtitle("US Unemployment Choropleth Map") + xlab("Longitude") + ylab("Latitude") +
  scale_fill_brewer(palette = "YlGnBu") +
  coord_map("sinusoidal")
The Mercator projection stretches geographic regions farther from the equator, making them appear larger than they actually are. The sinusoidal projection is an “equal-area” projection, so it does not distort the area of geographic regions. Instead, the meridians (lines of longitude) curve inward toward the poles so that every region is drawn with area proportional to its actual area, at the cost of distorting shapes. The differences are most pronounced in the northern regions of the US, where the distortion from the Mercator projection is greatest.
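The size of the Mercator distortion can be worked out directly. On a Mercator map the local length scale grows as 1 / cos(latitude), so areas are inflated by 1 / cos(latitude)^2. The sketch below evaluates that factor at 25°N and 49°N, which are rough, assumed values for the southern and northern edges of the contiguous US; the helper name `mercator_area_inflation` is made up for this example.

```r
# Area inflation of the Mercator projection relative to the equator
mercator_area_inflation <- function(lat_deg) {
  1 / cos(lat_deg * pi / 180)^2
}

# Assumed approximate latitude range of the contiguous United States
lats <- c(south = 25, north = 49)
inflation <- mercator_area_inflation(lats)
print(inflation)

# How much larger the northern edge is drawn relative to the southern edge
north_vs_south <- unname(inflation["north"] / inflation["south"])
print(north_vs_south)
```

Under these assumptions, a region at the northern border is drawn with roughly 1.9 times the area of an equally sized region at the southern tip, which is why the two maps differ most in the north.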
(10 points)
Watch this video. What do you like about it from a data visualization perspective (1-3 sentences)? What do you dislike, if anything (1-3 sentences)?
I liked how the video compared the number of human lives lost using the people figures, and how the bar graphs separated out important battles so you could see what proportion of the deaths came from each event. I also enjoyed the interactive feature where you could hover over the bars in the stacked bar chart and see where all the deaths were coming from, as well as the real-time “we are here” marker at the end. Overall, a very well done video.
The only thing I didn’t like was that, at the beginning, most of the explaining was done with audio rather than visuals, which made it harder to follow. That changed as they got into more of the data.